C2CU : A CUDA C Program Generator for Bulk Execution of a Sequential Algorithm
Authors
Abstract
We present a time-optimal implementation for bulk execution of an oblivious sequential algorithm. Our second contribution is a tool, named C2CU, that automatically generates a CUDA C program for bulk execution of an oblivious sequential algorithm.
Similar resources
Parallelization of the Cuckoo Search Using CUDA Architecture
Cuckoo Search is one of the recent swarm intelligence metaheuristics. It has been successfully applied to a number of optimization problems, but is still not very well researched. In this paper we present a parallelized version of the Cuckoo Search algorithm. The parallelization is implemented using the CUDA architecture. The algorithm is significantly changed compared to the sequential version. Changes...
Parallelization of the Simulation of the Two-Stream Instability Phenomenon Using the PIC Method
Two-stream instability in plasma is simulated by the PIC method. The execution time of the sequential and parallelizable sections of the program is measured. The sequential program is parallelized with the help of the MPI functions. Then, the execution time of the sequential program versus the number of the grid points and the execution time of the parallel program on 3 and 5 processors versus the nu...
A Hybrid Approach to Parallel Connected Component Labeling Using CUDA
Connected component labeling (CCL) is a mandatory step in image segmentation where each object in an image is identified and uniquely labeled. Sequential CCL is a time-consuming operation and thus is often implemented within a parallel processing framework to reduce execution time. Several parallel CCL methods have been proposed in the literature. Among them are NSZ label equivalence (NSZLE) meth...
Comparison of Parallel CUDA and OpenMP Implementations of Particle Swarm Optimization
Since physical constraints on microcomputing devices have forced researchers to design next-generation chips, parallelization and distributed computing are growing in importance. In this study, a sequential implementation of the Particle Swarm Optimization algorithm is converted into a concurrent version, which is executed on the cores of both CPU and GPU. For this rea...
Analysis of a Step-Based Watershed Algorithm Using CUDA
This paper proposes and develops a parallel algorithm for the watershed transform, with application on graphics hardware. The existing proposals are discussed and their aspects briefly analysed. The algorithm is proposed as a procedure of four steps, where each step performs a task using different approaches inspired by existing techniques. The algorithm is implemented using the CUDA libraries an...
Journal title:
- Concurrency and Computation: Practice and Experience
Volume 29, Issue
Pages -
Publication date 2014